The UPV Handwriting Recognition and Translation System for OpenHaRT 2013
Abstract
The NIST Open Handwriting Recognition and Translation Evaluation 2013 (NIST OpenHaRT’13) is a performance evaluation that assesses technologies for transcribing and translating text in document images. The evaluation focuses on recognizing Arabic text images and translating them into English. A Handwriting Recognition and Translation system typically combines two systems: a Text Recognition system and a Machine Translation system. In this paper, we present the UPV participation in the NIST OpenHaRT 2013 evaluation. For the Text Recognition system, we used the TL toolkit for training and recognition. For the Machine Translation system, we used the Moses toolkit for training and decoding. The evaluation is challenging, and our results significantly outperform those we obtained in the OpenHaRT 2010 evaluation.
Keywords: NIST OpenHaRT, Arabic HTR, Bernoulli HMM, Sliding Window, Repositioning
Related resources
OpenHaRT 2013 Evaluation: Description of the LITIS Handwriting Recognition System
In this paper, we present the Arabic handwriting recognition system that was submitted to the 2013 NIST Open Handwriting Recognition and Translation Evaluation (OpenHaRT 2013). Our baseline recognition system is based on Hidden Markov Models and we also propose a lattice-based framework to combine the outputs from several different recognition engines. Keywords—Document recognition, Arabic hand...
CITlab ARGUS for Arabic Handwriting
In recent years, it has been shown that multidimensional recurrent neural networks (MDRNN) perform very well in offline handwriting recognition problems like the OpenHaRT 2013 Document Image Recognition (DIR) task. With suitable writing preprocessing and dictionary lookup, our ARGUS software completed this task with an error rate of 26.27% in its primary setup. Keywords—handwriting recognition,...
Linguistic Resources for Handwriting Recognition and Translation Evaluation
We describe efforts to create corpora to support development and evaluation of handwriting recognition and translation technology. LDC has developed a stable pipeline and infrastructures for collecting and annotating handwriting linguistic resources to support the evaluation of MADCAT and OpenHaRT. We collect handwritten samples of pre-processed Arabic and Chinese data that has been already tra...
Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model
To facilitate the entry of data into the computer and its digitization, automatic recognition of printed texts and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...
Arabic Recognition and Translation System
To our knowledge, there are only a few systems that can automatically translate handwritten text images into another language, in particular from Arabic. Typically, the available systems are based on a concatenation of two systems: a Handwritten Text Recognition (HTR) system and a Machine Translation (MT) system. Roughly speaking, in the case of recognition of Arabic text images, our work has...